Addressing Knowledge Discovery Problems in a Multistrategy Framework

Author

  • Kenneth A. Kaufman
Abstract

From: Proceedings of the Third International Conference on Multistrategy Learning. Copyright © 1996, AAAI (www.aaai.org). All rights reserved.

ion with equal frequency. Cognitive scientists speak of basic level nodes within a generalization hierarchy whose children share many sensorially recognizable commonalities (Rosch et al. 1976). Other factors that help to characterize a node’s utility compared to those at higher or lower levels of abstraction are concept typicality (how common the features of this concept are among its sibling concepts) and the context in which the concept is being used (Klimesch 1988; Kubat, Bratko & Michalski 1996). Each of these factors affects the selection of a particular level of abstraction in human descriptions. By encoding the relative utility of the nodes into the knowledge representation of a discovery system, the system can present discovered knowledge that focuses on the most useful levels of abstraction when possible. We will typically prefer to see the classification rules "An object belongs to Class 1 if it is an apple" or "An object belongs to Class 1 if it is a fruit" instead of "An object belongs to Class 1 if it is a red delicious" or "An object belongs to Class 1 if it is food."

Another important feature in INLEN’s structured data representation is its ability to work with multiple views of the data (Kaufman & Michalski, 1996b). Consider an application in which a marketing specialist is trying to target the customers who are most likely to be interested in a new product. A customer database may have extensive information, including the model of the automobile driven by the customer. Automobiles may in turn be organized according to manufacturer, type (e.g., sedan, sports car, station wagon), price, year, etc. One may not know prior to a learning task which classification view will provide the most concise and useful knowledge. Also, more than one view may generate the best rules for determining whether a customer is likely to buy the product.
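The idea of encoding node utility into a generalization hierarchy can be illustrated with a small sketch. It is not INLEN's implementation: the `Node` class, the `preferred_node` function, and the utility scores are all hypothetical, chosen only to show how a system might pick the most useful node that still covers a set of observed values.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    utility: float  # hypothetical score; higher means a more useful level
    children: list[Node] = field(default_factory=list)


def leaf_names(node: Node) -> set[str]:
    """Return the names of all leaves below (or at) this node."""
    if not node.children:
        return {node.name}
    names: set[str] = set()
    for child in node.children:
        names |= leaf_names(child)
    return names


def covering_nodes(node: Node, values: set[str]):
    """Yield every node in the subtree whose leaves include all of `values`."""
    if values <= leaf_names(node):
        yield node
        # Only a covering node can have covering descendants.
        for child in node.children:
            yield from covering_nodes(child, values)


def preferred_node(root: Node, values) -> Node | None:
    """Pick the covering node with the highest utility, or None if none covers."""
    return max(covering_nodes(root, set(values)),
               key=lambda n: n.utility, default=None)


# Illustrative hierarchy; the utility numbers are invented for this sketch.
FOOD = Node("food", 0.2, [
    Node("fruit", 0.8, [
        Node("apple", 0.9, [
            Node("red delicious", 0.3),
            Node("granny smith", 0.3),
        ]),
        Node("banana", 0.9),
    ]),
    Node("bread", 0.9),
])
```

With these invented scores, `preferred_node(FOOD, {"red delicious"})` selects the `apple` node rather than the overly specific leaf or the overly general root. A real learner would also have to check that the chosen level stays consistent with the negative examples, which this sketch ignores.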
A decision rule, for example, may include the condition that a likely buyer will often drive European station wagons, while the specification of the type of vehicle or manufacturer alone may not give an accurate representation of the customer’s properties. INLEN’s generalization engine automatically selects from the possible ways of expressing a set of attribute values (in this case the automobile models that are European station wagons) a concise representation of the knowledge.

INLEN’s knowledge generation operators have also been enhanced to cope with the problem of incomplete data sets. Experiments with sparse data exposed some of the limitations of these operators, leading to modifications in which the logical implications of unknowns are more rigorously encoded into the learning algorithm. For example, in the AQ family of programs, attributes with unknown values are represented as having all of their values under consideration. A stipulation in earlier versions of the program required that generalizations of examples with unknown values for some attributes maintain consistency by permitting those attributes to take any value. While this will guarantee rigorous consistency with the data, trouble arises when many examples have only a few known attribute values. Generalizing just a few of them together creates a situation in which nothing can be assumed about any feature.

As an example of a domain in which this may be a real problem, intelligent agents are being developed to scan text and summarize it based on key words in several categories of interest to the user. Articles often will only contain key words in a few of these categories, leading to a very empty database. The relationships among entries in the various categories will often be tenuous ones, given that for much of the data, one or more of these fields will be empty. In such a domain a discovery program must be able to sift through the information that is present without getting lost in that which is missing.
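The effect of treating unknowns as "any value" can be seen in a toy sketch. The attribute names, domains, and both functions below are assumptions made for illustration; they are not the AQ or INLEN algorithms, whose details the text does not give. The strict variant expands every unknown to the full domain, while the relaxed variant generalizes only over the values that are actually known.

```python
# Hypothetical attribute domains for a keyword-summarization database.
DOMAINS = {
    "topic": {"sports", "politics", "science"},
    "region": {"europe", "asia", "americas"},
    "tone": {"positive", "negative"},
}


def generalize_strict(examples, domains):
    """Unknown (None) values are expanded to the attribute's full domain."""
    rule = {}
    for attr, domain in domains.items():
        allowed = set()
        for ex in examples:
            value = ex.get(attr)
            allowed |= domain if value is None else {value}
        rule[attr] = allowed
    return rule


def generalize_relaxed(examples, domains):
    """Unknown values are skipped; only known values shape the rule."""
    rule = {}
    for attr, domain in domains.items():
        known = {ex[attr] for ex in examples if ex.get(attr) is not None}
        # With no known value at all, the attribute stays unconstrained.
        rule[attr] = known if known else set(domain)
    return rule


def constraints(rule, domains):
    """Keep only the conditions that actually restrict an attribute."""
    return {a: vals for a, vals in rule.items() if vals < domains[a]}


# Two sparse examples, each with a single known attribute.
EXAMPLES = [
    {"topic": "sports", "region": None, "tone": None},
    {"topic": None, "region": "europe", "tone": None},
]
```

On these two examples the strict generalization leaves no restrictive condition at all, whereas the relaxed one keeps conditions on `topic` and `region`. The relaxed rule is consistent with the known facts only, which is the trade-off the text describes.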
At the expense of some additional computational complexity, the knowledge generation operators in INLEN have been modified in such a way that they can now generate knowledge consistent with what facts have been made available, without adhering to the assumption that unknowns must be generalized to take on all values. With the relaxation of this condition, a learning engine can detect more substantial relationships.

Another aspect of this research approaches the problem of knowledge extraction from distributed sources. The INSIGHT program (Ribeiro, Kaufman & Kerschberg, 1995) is being developed as an operator to perform knowledge-driven search through multiple databases. The combinatorial cost of combining separate data sets is avoided through INSIGHT’s mechanism of finding relationships between a database and the knowledge generated from another database.

In order to facilitate the interface with INSIGHT and other operators, INLEN maintains information on the database records relevant to each rule in its knowledge base. As was shown above in the population growth example, a use for these links is to enable the identification of significant clusters of exceptions. Another use is to allow the measurement of the degree of match between a rule and a set of records in a second database matching a given set of conditions. A high degree of match may suggest a linkage between the two concepts. For example, a rule describing the climate of a country may be cross-referenced with a database of natural disasters. If a class of natural disaster occurred in a set of countries similar to the set of countries covered by the climate rule, it may suggest a relationship between the climate and that kind of disaster.
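The cross-referencing step can be sketched as a set comparison. The text does not say how INSIGHT measures the degree of match, so Jaccard similarity is used here purely as an assumed stand-in. The function names, the record layout, and the placeholder country labels are likewise hypothetical.

```python
def matching_keys(records, key, conditions):
    """Collect the key field of every record that satisfies all conditions."""
    return {
        rec[key]
        for rec in records
        if all(rec.get(attr) == value for attr, value in conditions.items())
    }


def degree_of_match(rule_keys, other_keys):
    """Jaccard similarity of two key sets (an assumed measure, not INSIGHT's)."""
    a, b = set(rule_keys), set(other_keys)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Countries covered by a hypothetical climate rule, via its record links.
CLIMATE_RULE_COUNTRIES = {"A", "B", "C"}

# A second, hypothetical database of natural disasters.
DISASTERS = [
    {"country": "A", "kind": "flood"},
    {"country": "B", "kind": "flood"},
    {"country": "D", "kind": "flood"},
    {"country": "C", "kind": "earthquake"},
]

flood_countries = matching_keys(DISASTERS, "country", {"kind": "flood"})
score = degree_of_match(CLIMATE_RULE_COUNTRIES, flood_countries)
```

Because only the rule's linked record keys are compared with the selected records, the two databases never need to be joined, which is in the spirit of avoiding the combinatorial cost mentioned above. What score counts as a "high" match would have to be set by the analyst.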

Similar resources

A Multistrategy Learning Approach to Flexible Knowledge Organization and Discovery

Properly organizing knowledge so that it can be managed often requires the acquisition of patterns and relations from large, distributed, heterogeneous databases. The employment of an intelligent and automated KDD (Knowledge Discovery in Databases) pro...

AqBC: A Multistrategy Approach for Constructive Induction

In order to obtain potentially interesting patterns and relations from large, distributed, heterogeneous databases, it is essential to employ an intelligent and automated KDD (Knowledge Discovery in Databases) process. One of the most important methodologies is an integration of diverse learning strategies that cooperatively performs a variety of techniques and achieves high quality knowledge. ...

Multistrategy Data Exploration Using the INLEN System: Recent Advances

Recent advances in the development of the INLEN system for multistrategy data exploration are briefly reviewed. These advances include the development of a meta-level language for data mining and knowledge discovery, called knowledge generation language (KGL), and the employment of a new type of attributes, called structured attributes. These features are illustrated by an example concerned wit...

A Methodology and Life Cycle Model for Data Mining and Knowledge Discovery in Precision Agriculture

This paper presents a methodology for data mining and knowledge discovery in large, distributed and heterogeneous databases. In order to obtain potentially interesting patterns, relationships, and rules from such large and heterogeneous data collections, it is essential that a methodology be developed to take advantage of the suite of existing methods and tools available for data mining and kno...

An Inference-based Framework for Multistrategy Learning

This chapter describes a general framework for multistrategy learning. One idea of this framework is to view learning as an inference process and to integrate the elementary inferences that are employed by the single-strategy learning methods. Another idea is to base learning on building and generalizing a special type of explanation structure called plausible justification tree which is compos...

Data Mining and Knowledge Discovery: A Review of Issues and a Multistrategy Approach

An enormous proliferation of databases in almost every area of human endeavor has created a great demand for new, powerful tools for turning data into useful, task-oriented knowledge. In efforts to satisfy this need, researchers have been exploring ideas and methods developed in machine learning, pattern recognition, statistical data analysis, data visualization, neural nets, etc. These efforts...

Publication date: 2001